metric entropy

Does Flatness imply Generalization for Logistic Loss in Univariate Two-Layer ReLU Network?

Qiao, Dan, Wang, Yu-Xiang

arXiv.org Machine Learning

We consider the problem of generalization of arbitrarily overparameterized two-layer ReLU neural networks with univariate input. Recent work showed that under square loss, flat solutions (motivated by flat / stable minima and the Edge of Stability phenomenon) provably cannot overfit, but it remains unclear whether the same phenomenon holds for logistic loss. This is a puzzling open problem because existing work on logistic loss shows that gradient descent with increasing step size converges to interpolating solutions (at infinity, in the margin-separable cases). In this paper, we prove that the \emph{flatness implies generalization} principle is more delicate under logistic loss. On the positive side, we show that flat solutions enjoy near-optimal generalization bounds within a region between the left-most and right-most \emph{uncertain} sets determined by each candidate solution. On the negative side, we show that there exist arbitrarily flat yet overfitting solutions at infinity that are (falsely) certain everywhere, certifying that flatness alone is insufficient for generalization in general. We demonstrate the effects predicted by our theory in a well-controlled simulation study.
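
To make the setting concrete, here is a minimal sketch (not the authors' code) of the objects the abstract discusses: a univariate two-layer ReLU network, its logistic loss on $\pm 1$ labels, and a crude random-perturbation flatness proxy. The width, data, and perturbation scale are illustrative assumptions, and the proxy is not the paper's exact flatness measure.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def forward(params, x):
    # Univariate two-layer ReLU net: f(x) = sum_j a_j * relu(w_j * x + b_j)
    a, w, b = params
    return relu(np.outer(x, w) + b) @ a

def logistic_loss(params, x, y):
    # y in {-1, +1}; mean of log(1 + exp(-y * f(x)))
    margins = y * forward(params, x)
    return np.mean(np.logaddexp(0.0, -margins))

def sharpness_proxy(params, x, y, eps=1e-2, trials=50):
    # Crude flatness measure: average loss increase under random parameter
    # perturbations of scale eps (an assumption, not the paper's definition).
    base = logistic_loss(params, x, y)
    deltas = []
    for _ in range(trials):
        noisy = tuple(p + eps * rng.standard_normal(p.shape) for p in params)
        deltas.append(logistic_loss(noisy, x, y) - base)
    return float(np.mean(deltas))

# Toy univariate data with +-1 labels and an overparameterized width.
x = rng.uniform(-1, 1, size=64)
y = np.sign(x + 0.1 * rng.standard_normal(64))
m = 100
params = (rng.standard_normal(m) / m, rng.standard_normal(m), rng.standard_normal(m))
print(logistic_loss(params, x, y), sharpness_proxy(params, x, y))
```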




Generalization Bounds in Hybrid Quantum-Classical Machine Learning Models

Wu, Tongyan, Bentellis, Amine, Sakhnenko, Alona, Lorenz, Jeanette Miriam

arXiv.org Artificial Intelligence

Hybrid classical-quantum models aim to harness the strengths of both quantum computing and classical machine learning, but their practical potential remains poorly understood. In this work, we develop a unified mathematical framework for analyzing generalization in hybrid models, offering insight into how these systems learn from data. We establish a novel generalization bound of the form $\tilde{\mathcal O}\left( \tfrac{\alpha^{k}}{\sqrt{N}} \big( k^{3/2}\sqrt{mn} + \sqrt{T\log T}\big) \right)$ for $N$ training data points, $T$ trainable quantum gates, $n$-dimensional quantum circuit output, and $k$ bounded linear layers $\|F_i\|_F \leq \alpha$, where $i = 1, \dots, k$ and $F_i \in \mathbb{R}^{m \times n}$, interspersed with activation functions. This generalization bound decomposes into quantum and classical contributions, providing a theoretical framework that separates their influence and clarifies their interaction. Alongside the bound, we highlight conceptual limitations of applying classical statistical learning theory in the hybrid setting and suggest promising directions for future theoretical work.
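
As a quick sanity check on how the stated bound scales, the following sketch evaluates it up to constants and the log factors hidden in the $\tilde{\mathcal O}$; the example parameter values are arbitrary assumptions, not numbers from the paper.

```python
import math

def hybrid_bound(N, T, n, m, k, alpha):
    """Scaling of the stated generalization bound, up to constants and the
    hidden log factors of the O-tilde (illustrative, not constant-exact)."""
    return (alpha ** k / math.sqrt(N)) * (k ** 1.5 * math.sqrt(m * n)
                                          + math.sqrt(T * math.log(T)))

# Example: 10k samples, 200 trainable gates, 8-dim circuit output,
# three 16x8 linear layers with Frobenius norm <= 2 (assumed values).
print(hybrid_bound(N=10_000, T=200, n=8, m=16, k=3, alpha=2.0))
```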


Approximation Rates of Shallow Neural Networks: Barron Spaces, Activation Functions and Optimality Analysis

Lu, Jian, Huang, Xiaohuang

arXiv.org Artificial Intelligence

This paper investigates the approximation properties of shallow neural networks whose activation functions are powers of exponential functions. It focuses on how the approximation rate depends on the dimension and on the smoothness of the target function within the Barron function space. We examine the approximation rates of networks with ReLU$^{k}$ activation functions, proving that the optimal rate cannot be achieved under $\ell^{1}$-bounded coefficients or insufficient smoothness conditions. We also establish optimal approximation rates in various norms for functions in Barron spaces and Sobolev spaces, confirming the curse of dimensionality. Our results clarify the limits of shallow neural networks' approximation capabilities and offer insights into the selection of activation functions and network structures.
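
The objects under study can be illustrated numerically: a shallow network with ReLU$^{k}$ activation, random inner weights, and fitted outer coefficients whose $\ell^{1}$ norm is reported alongside the error, since the abstract ties optimality to $\ell^{1}$-bounded coefficients. This is a toy sketch, not the paper's constructions; the target function, width, and $k$ are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu_k_features(x, w, b, k):
    # ReLU^k activation: sigma_k(z) = max(z, 0)**k, one column per neuron.
    return np.maximum(np.outer(x, w) + b, 0.0) ** k

# Shallow network with random inner weights and least-squares outer weights,
# a common proxy for studying shallow-network approximation.
n_neurons, k = 200, 2
w = rng.standard_normal(n_neurons)
b = rng.standard_normal(n_neurons)
x = np.linspace(-1, 1, 256)
target = np.sin(np.pi * x)  # arbitrary smooth target for illustration

Phi = relu_k_features(x, w, b, k)
coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)
approx = Phi @ coef
print("L2 approximation error:", np.sqrt(np.mean((approx - target) ** 2)))
print("l1 norm of outer coefficients:", np.abs(coef).sum())
```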



Partially Observable Reinforcement Learning with Memory Traces

Eberhard, Onno, Muehlebach, Michael, Vernade, Claire

arXiv.org Artificial Intelligence

Partially observable environments present a considerable computational challenge in reinforcement learning due to the need to consider long histories. Learning with a finite window of observations quickly becomes intractable as the window length grows. In this work, we introduce memory traces. Inspired by eligibility traces, these are compact representations of the history of observations in the form of exponential moving averages. We prove sample complexity bounds for the problem of offline on-policy evaluation that quantify the value errors achieved with memory traces for the class of Lipschitz continuous value estimates. We establish a close connection to the window approach, and demonstrate that, in certain environments, learning with memory traces is significantly more sample efficient. Finally, we underline the effectiveness of memory traces empirically in online reinforcement learning experiments for both value prediction and control.
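
A memory trace as described, an exponential moving average of the observation history, can be sketched in a few lines. The specific recursion $z_t = (1-\lambda) z_{t-1} + \lambda o_t$ and the single decay rate are assumptions for illustration and may differ from the paper's exact parameterization.

```python
import numpy as np

def memory_trace(observations, lam):
    """Exponential moving average of the observation history:
    z_t = (1 - lam) * z_{t-1} + lam * o_t  (assumed recursion form)."""
    z = np.zeros_like(observations[0], dtype=float)
    traces = []
    for o in observations:
        z = (1.0 - lam) * z + lam * np.asarray(o, dtype=float)
        traces.append(z.copy())
    return np.stack(traces)

# A trace summarizes an unbounded history in fixed memory, unlike a
# length-L window whose representation grows with L.
obs = np.eye(4)[[0, 1, 1, 2, 3, 0]]   # one-hot observations
print(memory_trace(obs, lam=0.3))
```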


Robust density estimation over star-shaped density classes

Liu, Xiaolong, Neykov, Matey

arXiv.org Machine Learning

We establish a novel criterion for comparing the performance of two densities, $g_1$ and $g_2$, within the context of corrupted data. Utilizing this criterion, we propose an algorithm to construct a density estimator within a star-shaped density class, $\mathcal{F}$, under conditions of data corruption. We proceed to derive the minimax upper and lower bounds for density estimation across this star-shaped density class, characterized by densities that are uniformly bounded above and below (in the sup norm), in the presence of adversarially corrupted data. Specifically, we assume that a fraction $\epsilon \leq \frac{1}{3}$ of the $N$ observations are arbitrarily corrupted. We obtain the minimax upper bound $\max\{ \tau_{\overline{J}}^2, \epsilon \} \wedge d^2$. Under certain conditions, we obtain the minimax risk, up to proportionality constants, under the squared $L_2$ loss as $$ \max\left\{ \tau^{*2} \wedge d^2, \epsilon \wedge d^2 \right\}, $$ where $\tau^* := \sup\left\{ \tau : N\tau^2 \leq \log \mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c) \right\}$ for a sufficiently large constant $c$. Here, $\mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c)$ denotes the local entropy of the set $\mathcal{F}$, and $d$ is the $L_2$ diameter of $\mathcal{F}$.
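
The fixed-point quantity $\tau^*$ can be computed by bisection once a local-entropy function is supplied, since $N\tau^2$ increases in $\tau$ while the entropy typically decreases. Below is a sketch under that (assumed) monotonicity, with a hypothetical polynomial entropy chosen so the answer is checkable in closed form.

```python
import math

def tau_star(N, log_local_entropy, lo=1e-8, hi=1.0, iters=80):
    """Solve sup{tau : N * tau^2 <= log_local_entropy(tau)} by bisection,
    assuming the entropy decreases in tau so the crossing is unique."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if N * mid ** 2 <= log_local_entropy(mid):
            lo = mid   # condition holds at mid, so tau* >= mid
        else:
            hi = mid
    return lo

# Hypothetical local entropy log M(tau) = tau^(-1/2), chosen only so the
# fixed point is explicit: N tau^2 = tau^(-1/2)  =>  tau* = N^(-2/5).
N = 10_000
print(tau_star(N, lambda t: t ** -0.5), N ** -0.4)
```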


Which Spaces can be Embedded in $L_p$-type Reproducing Kernel Banach Space? A Characterization via Metric Entropy

Lu, Yiping, Lin, Daozhe, Du, Qiang

arXiv.org Machine Learning

In this paper, we establish a novel connection between metric entropy growth and the embeddability of function spaces into reproducing kernel Hilbert/Banach spaces. Metric entropy characterizes the information complexity of function spaces and has implications for their approximability and learnability. Classical results show that embedding a function space into a reproducing kernel Hilbert space (RKHS) implies a bound on its metric entropy growth. Surprisingly, we prove a \textbf{converse}: a bound on the metric entropy growth of a function space allows it to be embedded into an $L_p$-type Reproducing Kernel Banach Space (RKBS). This shows that the $L_p$-type RKBS provides a broad modeling framework for learnable function classes with controlled metric entropies. Our results shed new light on the power and limitations of kernel methods for learning complex function spaces.
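
For reference, the metric entropy invoked throughout is the standard covering-number quantity; it is growth bounds on this logarithm that the converse result converts into an $L_p$-type RKBS embedding: $$ N(\varepsilon, \mathcal{F}, \|\cdot\|) := \min\left\{ n : \exists\, f_1, \dots, f_n \text{ such that } \mathcal{F} \subseteq \bigcup_{i=1}^{n} \left\{ f : \|f - f_i\| \leq \varepsilon \right\} \right\}, \qquad \mathcal{H}(\varepsilon, \mathcal{F}) := \log N(\varepsilon, \mathcal{F}, \|\cdot\|). $$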